Extended Comment on Language Trees and Zipping

Author

  • Joshua Goodman
Abstract

This is the extended version of a Comment submitted to Physical Review Letters. I first point out the inappropriateness of publishing a Letter unrelated to physics. Next, I give experimental results showing that the technique used in the Letter is 3 times worse and 17 times slower than a simple baseline. And finally, I review the literature, showing that the ideas of the Letter are not novel. I conclude by suggesting that Physical Review Letters should not publish Letters unrelated to physics.

A recent Letter to Physical Review Letters, “Language Trees and Zipping,” by Benedetto et al. (2002) (available at http://babbage.sissa.it/abs/cond-mat/0108530) is flawed in several ways. First of all, the Letter had nothing to do with physics, and instead belonged in a computer science journal, if it deserved to be published at all. Second of all, the actual results are unimpressive: as I will show, the techniques used lead to 3 times as many errors and are 17 times slower than a very simple baseline model applied to a standard, similar problem. Finally, the ideas in the Letter are not even novel: they are well known to those in several areas of computer science.

The actual paper is clearly unrelated to physics, and much more closely related to areas of computer science such as Computational Linguistics and Machine Learning, as can be seen simply by reading the abstract of their paper, which I include here:

In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.
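The quoted abstract describes the compression-based measure only in general terms. As a rough illustration of the idea, here is a minimal sketch that assumes zlib as the compressor and measures how many extra compressed bytes a sample costs when appended to each reference text. The function names (`compressed_size`, `cross_cost`, `closest_reference`) are hypothetical, and this is neither the exact procedure of Benedetto et al. nor the baseline used in the Comment.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def cross_cost(reference: bytes, sample: bytes) -> int:
    """Extra compressed bytes needed when `sample` is appended to `reference`."""
    return compressed_size(reference + sample) - compressed_size(reference)


def closest_reference(sample: bytes, references: Mapping[str, bytes]) -> str:
    """Label of the reference text after which `sample` compresses most cheaply."""
    return min(references, key=lambda label: cross_cost(references[label], sample))
```

A smaller cost suggests that the sample shares more repeated substrings with that reference. Note that zlib only looks back over a 32 KB window, so this sketch is meaningful only for short reference texts.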

Similar resources

Comment on "Language Trees and Zipping"

This is the extended version of a Comment submitted to Physical Review Letters. I first point out the inappropriateness of publishing a Letter unrelated to physics. Next, I give experimental results showing that the technique used in the Letter is 3 times worse and 17 times slower than a simple baseline. And finally, I review the literature, showing that the ideas of the Letter are not novel. I...

Comment on "Language Trees and Zipping" arXiv:cond-mat/0108530

Every encoding has a priori information if the encoding represents any semantic information of the universe or object. Encoding means mapping from the universe to the string or strings of digits. The semantic here is used in the model-theoretic sense or denotation of the object. If encoding or strings of symbols is the adequate and true mapping of model or object, and the mapping is recursive or compu...

Language trees and zipping.

In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly acc...

Peer Reviewers’ Comments on Research Articles Submitted by Iranian Researchers

The invisible hands of peer reviewers play a determining role in the eventual fate of submissions to international English-medium journals. This study builds on the assumption that non-native researchers and prospective academic authors may find the whole strive for publication, and more specifically, the tough review process, less threatening if they are aware of journal reviewers’ expectation...

Health Rights and Realization; Comment on “Rights Language in the Sustainable Development Agenda: Has Right to Health Discourse and Norms Shaped Health Goals?”

In their hypothesis published in IJHPM, Lisa Forman and colleagues examined the prominence of the right to health and sexual and reproductive health rights (as well as related language) in four of the key reports that fed into the process of negotiating the Sustainable Development Goals (SDGs). Now that the SDGs have been formally adopted, this comment builds on some of the insights of Forman a...

Journal title:
  • CoRR

Volume cond-mat/0202383  Issue

Pages  -

Publication date 2002